834 research outputs found

Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation

Author: Gao Lixin
Gao Qixin
Wang Cuirong
Zhang Yanfeng
Publication date: 16/10/2017

A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the "changes" between iterations. With DAIC, we can process only the "changes" and so avoid negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronous barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on a local cluster as well as on Amazon EC2 Cloud. The results show that Maiter achieves as much as 60x speedup over Hadoop and outperforms other state-of-the-art frameworks. Comment: ScienceCloud 2012, TKDE 201
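
The idea of accumulating "changes" instead of recomputing full results can be illustrated with a small single-machine sketch. The code below is not Maiter's implementation or API; it is a hypothetical delta-based PageRank, assuming a damping factor of 0.8, an unnormalized rank, and a simple work queue in place of distributed asynchronous workers. The names `delta_pagerank`, `rank`, and `delta` are illustrative only.

```python
from collections import deque


def delta_pagerank(graph, damping=0.8, tol=1e-10):
    """Delta-based accumulative PageRank sketch (unnormalized ranks).

    graph: dict mapping every vertex to a list of its out-neighbors.
    Each vertex keeps an accumulated rank and a pending delta. Processing
    a vertex folds its delta into its rank and forwards a damped share of
    that delta to its out-neighbors. No global barrier is required, so
    vertices may be processed in any order.
    """
    rank = {v: 0.0 for v in graph}
    delta = {v: 1.0 - damping for v in graph}  # assumed initial delta
    queue = deque(graph)   # vertices whose pending delta exceeds tol
    queued = set(graph)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        change = delta[v]
        delta[v] = 0.0
        rank[v] += change                      # accumulate the change
        out = graph[v]
        if not out:
            continue                           # dangling vertex: nothing to send
        share = damping * change / len(out)
        for w in out:
            delta[w] += share
            # Only schedule vertices with a non-negligible pending change.
            if w not in queued and delta[w] > tol:
                queue.append(w)
                queued.add(w)
    return rank


if __name__ == "__main__":
    cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(delta_pagerank(cycle))
```

Vertices whose pending delta stays below `tol` are never rescheduled, which is one way to read the abstract's point about skipping negligible updates.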

arXiv.org e-Print Archive

CiteSeerX

Quantifying AS Path Inflation by Routing Policies

Author: Gao Lixin
Gao Qixin
Wang Feng
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2016

A route in the Internet may take a longer AS path than the shortest AS path due to routing policies. In this paper, we systematically analyze AS paths and quantify the extent to which routing policies inflate AS paths. The results show that AS path inflation in the Internet is more prevalent than expected. We first present the extent of AS path inflation observed from the RouteView and RIPE routing tables. We then employ three common routing policies to show the extent of AS path inflation. We find that No-Valley routing policy causes the least AS path inflation among the three routing policies. PreferCustomer-and-Peer-over-Provider policy causes the most AS path inflation. In addition, we find that single-homed stub ASes experience more path inflations than transit ASes and multi-homed ASes. The AS pairs with shortest AS path of 3 AS hops experience more path inflations than other AS pairs. Finally, we investigate the AS path inflation on the end-to-end path from end users to two popular content providers, Google and Comcast. Although the majority of the shortest AS paths from end users to the two providers consists of no more than three AS hops, the actual end-to-end paths that the traffic will take are longer than the shortest AS paths in many cases. Quantifying AS path inflation in the Internet has important implications on the extent of routing policies, traffic engineering performed on the Internet, and BGP convergence speed
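
Path inflation under a no-valley policy can be made concrete with a toy example. The sketch below is not the paper's methodology or data. It assumes the commonly used valley-free rule (zero or more customer-to-provider hops, then at most one peer-to-peer hop, then zero or more provider-to-customer hops) and a hypothetical five-AS topology; all names (`shortest_hops`, `toy_topology`, the AS labels) are invented for illustration.

```python
from collections import deque

# Relationship of the directed edge u -> v, seen from u.
C2P, P2P, P2C = "c2p", "p2p", "p2c"


def shortest_hops(rel, src, dst, valley_free):
    """Return the minimum AS hop count from src to dst, or None.

    rel: dict mapping (u, v) to the relationship label of edge u -> v.
    With valley_free=True, a path may not climb (c2p) or cross a peer
    link (p2p) after it has used a p2p or p2c edge.
    """
    adj = {}
    for (u, v), r in rel.items():
        adj.setdefault(u, []).append((v, r))
    start = (src, False)          # state: (AS, already heading downhill?)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node, down = queue.popleft()
        if node == dst:
            return dist[(node, down)]
        for nxt, r in adj.get(node, []):
            if valley_free and down and r != P2C:
                continue          # would create a valley
            state = (nxt, down or (valley_free and r != C2P))
            if state not in dist:
                dist[state] = dist[(node, down)] + 1
                queue.append(state)
    return None


def toy_topology():
    """Hypothetical topology: X is a customer of A and B; A is a customer
    of P; B is a customer of Q; P and Q are peers."""
    rel = {}
    for customer, provider in [("X", "A"), ("X", "B"), ("A", "P"), ("B", "Q")]:
        rel[(customer, provider)] = C2P
        rel[(provider, customer)] = P2C
    rel[("P", "Q")] = P2P
    rel[("Q", "P")] = P2P
    return rel


if __name__ == "__main__":
    rel = toy_topology()
    shortest = shortest_hops(rel, "A", "B", valley_free=False)
    policy = shortest_hops(rel, "A", "B", valley_free=True)
    print(shortest, policy, policy - shortest)  # inflation in AS hops
```

In this toy graph the shortest path A-X-B passes through a shared customer, which the assumed policy forbids, so the policy-compliant path A-P-Q-B is one hop longer.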

ScholarWorks@UMass Amherst

CSD: Discriminance with Conic Section for Improving Reverse k Nearest Neighbors Queries

Author: Bai Mingyuan
Gao Junbin
Li Yang
Liu Gang
Ming Zi
Ye Lixin
Publication date: 18/05/2020

The reverse k nearest neighbor (RkNN) query finds all points that have the query point as one of their k nearest neighbors (kNN), where the kNN query finds the k closest points to its query point. Based on the characteristics of conic sections, we propose a discriminance, named CSD (Conic Section Discriminance), to determine whether points belong to the RkNN set without issuing any queries with non-constant computational complexity. By using CSD, we also implement an efficient RkNN algorithm, CSD-RkNN, with a computational complexity of $O(k^{1.5} \cdot \log k)$. Comparative experiments are conducted between CSD-RkNN and two other state-of-the-art RkNN algorithms, SLICE and VR-RkNN. The experimental results indicate that the efficiency of CSD-RkNN is significantly higher than that of its competitors.
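
The RkNN definition above can be checked against a brute-force baseline. The sketch below is not CSD-RkNN and uses no conic-section test; it only restates the definition in code, assuming Euclidean distance, a query point that is not itself in the data set, and strict comparison for ties. The function name `rknn_brute_force` is hypothetical.

```python
import math


def rknn_brute_force(points, query, k):
    """Return the points that have `query` among their k nearest neighbors.

    points: list of coordinate tuples (the data set, excluding the query).
    A point p qualifies when fewer than k other data points lie strictly
    closer to p than the query does. Runs in O(n^2) distance evaluations.
    """
    result = []
    for i, p in enumerate(points):
        d_query = math.dist(p, query)
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and math.dist(p, other) < d_query
        )
        if closer < k:
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    print(rknn_brute_force(data, (1.8, 0.0), k=1))
```

A quadratic baseline like this is mainly useful for validating a faster algorithm's output on small inputs.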

arXiv.org e-Print Archive

The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems

Author: Gao Wanling
Jia Zhen
Shi Yingjie
Wang Lei
Zhan Jianfeng
Zhang Lixin
Zhou Runlin
Zhu Chunge
Publication date: 30/07/2013

We now live in an era of big data, and big data applications are becoming more and more pervasive. How to benchmark data center computer systems running big data applications (in short, big data systems) is a hot topic. In this paper, we focus on measuring the performance impacts of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications, an important class of big data applications, we find two major results through experiments. First, the data scale has a significant impact on the performance of big data systems, so we must provide scalable volumes of data sets in big data benchmarks. Second, even though all four applications use simple algorithms, their performance trends differ with increasing data scales, and hence we must consider not only a variety of data sets but also a variety of applications in benchmarking big data systems. Comment: 16 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

A core stateless bandwidth broker architecture for scalable support of guaranteed services

Author: Lixin Gao
Y.T. Hou
Zhenhai Duan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Making Networks Robust to Component Failures

Author: Daniel Gyllstrom
Lixin Gao
Lori A. Clarke
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2014

In this thesis, we consider instances of component failure in the Internet and in networked cyber-physical systems, such as the communication network used by the modern electric power grid (termed the smart grid). We design algorithms that make these networks more robust to various component failures, including failed routers, failures of links connecting routers, and failed sensors. This thesis divides into three parts: recovery from malicious or misconfigured nodes injecting false information into a distributed system (e.g., the Internet), placing smart grid sensors to provide measurement error detection, and fast recovery from link failures in a smart grid communication network. First, we consider the problem of malicious or misconfigured nodes that inject and spread incorrect state throughout a distributed system. Such false state can degrade the performance of a distributed system or render it unusable. For example, in the case of network routing algorithms, false state corresponding to a node incorrectly declaring a cost of 0 to all destinations (maliciously or due to misconfiguration) can quickly spread through the network. This causes other nodes to (incorrectly) route via the misconfigured node, resulting in suboptimal routing and network congestion. We propose three algorithms for efficient recovery in such scenarios and evaluate their efficacy. The last two parts of this thesis consider robustness in the context of the electric power grid. We study the use and placement of a sensor, called a Phasor Measurement Unit (PMU), currently being deployed in electric power grids worldwide. PMUs provide voltage and current measurements at a sampling rate orders of magnitude higher than the status quo. As a result, PMUs can both drastically improve existing power grid operations and enable an entirely new set of applications, such as the reliable integration of renewable energy resources. 
However, PMU applications require correct (addressed in thesis part 2) and timely (covered in thesis part 3) PMU data. Without these guarantees, smart grid operators and applications may make incorrect decisions and take corresponding (incorrect) actions. The second part of this thesis addresses PMU measurement errors, which have been observed in practice. We formulate a set of PMU placement problems that aim to satisfy two constraints: place PMUs near each other to allow for measurement error detection and use the minimal number of PMUs to infer the state of the maximum number of system buses and transmission lines. For each PMU placement problem, we prove it is NP-Complete, propose a simple greedy approximation algorithm, and evaluate our greedy solutions. In the last part of this thesis, we design algorithms for fast recovery from link failures in a smart grid communication network. We propose, design, and evaluate solutions to all three aspects of link failure recovery: (a) link failure detection, (b) algorithms for pre-computing backup multicast trees, and (c) fast backup tree installation. To address (a), we design link-failure detection and reporting mechanisms that use OpenFlow to detect link failures when and where they occur inside the network. OpenFlow is an open source framework that cleanly separates the control and data planes for use in network management and control. For part (b), we formulate a new problem, Multicast Recycling, that pre-computes backup multicast trees that aim to minimize control plane signaling overhead. We prove Multicast Recycling is at least NP-hard and present a corresponding approximation algorithm. Lastly, two control plane algorithms are proposed that signal data plane switches to install pre-computed backup trees. An optimized version of each installation algorithm is designed that finds a near minimum set of forwarding rules by sharing forwarding rules across multicast groups. This optimization reduces backup tree install time and associated control state. We implement these algorithms using the POX open-source OpenFlow controller and evaluate them using the Mininet emulator, quantifying control plane signaling and installation time.
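
The false-state example in the abstract above (a node declaring a cost of 0 to all destinations) can be reproduced with a toy distance-vector simulation. The sketch below does not implement any of the thesis's recovery algorithms. It assumes a synchronous Bellman-Ford exchange, symmetric link costs, and a hypothetical five-node topology, and shows only how the false state spreads. The names `distance_vector`, `liar`, and the node labels are illustrative.

```python
INF = float("inf")


def distance_vector(links, liar=None):
    """Run synchronous distance-vector rounds and return (dist, next_hop).

    links: dict mapping (u, v) to a symmetric link cost.
    liar:  optional node that advertises cost 0 to every destination.
    """
    nodes = sorted({n for edge in links for n in edge})
    cost = {}
    for (u, v), c in links.items():
        cost[(u, v)] = c
        cost[(v, u)] = c
    neighbors = {n: [m for m in nodes if (n, m) in cost] for n in nodes}
    dist = {n: {d: (0 if d == n else INF) for d in nodes} for n in nodes}
    next_hop = {n: {d: None for d in nodes} for n in nodes}
    for _ in range(len(nodes)):   # enough rounds for this small example
        # Snapshot what each node advertises this round.
        advertised = {
            n: ({d: 0 for d in nodes} if n == liar else dict(dist[n]))
            for n in nodes
        }
        for n in nodes:
            for d in nodes:
                if d == n:
                    continue
                best, hop = INF, None
                for m in neighbors[n]:
                    candidate = cost[(n, m)] + advertised[m][d]
                    if candidate < best:
                        best, hop = candidate, m
                dist[n][d], next_hop[n][d] = best, hop
    return dist, next_hop


# Hypothetical topology: chain A - B - C - D, with M attached to A.
LINKS = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 5, ("A", "M"): 1}

if __name__ == "__main__":
    _, honest = distance_vector(LINKS)
    _, poisoned = distance_vector(LINKS, liar="M")
    for n in "ABC":
        print(n, "-> D via", honest[n]["D"], "then via", poisoned[n]["D"])
```

In this toy run, once M lies, nodes A, B, and C all redirect traffic for D toward M, which matches the abstract's description of other nodes incorrectly routing via the misconfigured node.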

CiteSeerX

ScholarWorks@UMass Amherst